A basic concept in (statistical) programming is the variable.
A variable allows you to store a value (e.g. 4) or an object (e.g. a function description) in R. You can then later use this variable’s name to easily access the value or the object that is stored within this variable.
Save information as an R object with the less-than sign followed by a minus, i.e. an arrow: <-
# name of new object, assignment operator ("gets"), information to store in the object
foo <- 4
Save the output of one function as an R object to use it in a second function
foo
[1] 4
factorial(foo)
[1] 24
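The idea of storing one function's output and passing it on to a second function can be sketched as follows. This is a minimal illustration; the names n, n_fact and sqrt_fact are hypothetical and do not come from the original material.

```r
# Hypothetical names, used only for this illustration
n <- 4                      # store a value
n_fact <- factorial(n)      # store the output of one function
sqrt_fact <- sqrt(n_fact)   # pass the stored result to a second function
print(n_fact)               # 24
```

Because n_fact is stored, it can be reused as often as needed without calling factorial again.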
You can remove an object with rm
fac_foo <- factorial(foo)
fac_foo
[1] 24
rm(foo)
rm(fac_foo)
mean
[1] 0.1731301
pi
[1] 3.141593
You can save more than a single number in an object by creating a vector, matrix, or array.
class(WorldPhones)
[1] "matrix"
Combine multiple elements into a one-dimensional array (a vector).
Create it with the c function.
vec
[1] 1 2 3 10 100
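The call that created vec is not shown. A likely form, assuming it was built directly from the values printed above, is:

```r
# Assumed construction of the vector printed above
vec <- c(1, 2, 3, 10, 100)
length(vec)   # 5 elements
vec[4]        # 10: square brackets pick an element by position
```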
Combine multiple elements into a two-dimensional array (a matrix).
Create it with the matrix function.
mat
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
mat
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 2 3 4 5 6 1
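The calls behind these printouts are not shown. As a sketch, the first two layouts can be reproduced as below; the object names are hypothetical, and the byrow variant is added only to show the alternative fill order.

```r
# Assumed calls that reproduce the first two layouts printed above
mat_2x3 <- matrix(1:6, nrow = 2)   # filled column by column: 2 rows, 3 columns
mat_3x2 <- matrix(1:6, ncol = 2)   # same data arranged in 3 rows, 2 columns
mat_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)  # fill row by row instead
dim(mat_2x3)                       # 2 3
```

By default matrix fills column by column, which is why the first row of mat_2x3 reads 1 3 5.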
vec; vec2
[1] 1 2 3 10 100
[1] 5 6 7 14 104
vec * 4 ; vec2 * 4
[1] 4 8 12 40 400
[1] 20 24 28 56 416
vec * vec ; vec2 * vec2; c(23,vec) * c(vec2,2);vec;vec2
[1] 1 4 9 100 10000
[1] 25 36 49 196 10816
[1] 115 6 14 42 1040 200
[1] 1 2 3 10 100
[1] 5 6 7 14 104
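The results above follow from R's element-by-element arithmetic and its recycling of the shorter operand. A small sketch, assuming vec2 was defined as vec + 4 (which matches the printed values); the vectors short and long are hypothetical:

```r
vec  <- c(1, 2, 3, 10, 100)
vec2 <- vec + 4            # assumed definition; gives 5 6 7 14 104
vec * 4                    # the single value 4 is recycled: 4 8 12 40 400
vec * vec2                 # element by element: 5 12 21 140 10400
short <- c(1, 2)
long  <- c(10, 20, 30, 40)
long * short               # short is recycled: 10 40 30 80
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning.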
inner
vec; vec %*% vec; mat;mat %*% mat
[1] 1 2 3 10 100
[,1]
[1,] 10114
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
[,1] [,2] [,3]
[1,] 30 66 102
[2,] 36 81 126
[3,] 42 96 150
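To make the difference between the inner (matrix) product %*% and the element-wise product * explicit, here is a short sketch. It assumes the 3 x 3 matrix printed above was built from the numbers 1 to 9; the names inner, prod_mat and elem are hypothetical.

```r
vec <- c(1, 2, 3, 10, 100)
inner <- vec %*% vec          # a 1 x 1 matrix
sum(vec * vec)                # the same value as a plain number: 10114
mat <- matrix(1:9, nrow = 3)  # assumed construction of the 3 x 3 matrix above
prod_mat <- mat %*% mat       # row-by-column matrix product
prod_mat[1, 1]                # 1*1 + 4*2 + 7*3 = 30
elem <- mat * mat             # element-wise product, a different operation
elem[3, 3]                    # 9 * 9 = 81
```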
outer
vec; vec %o% vec; mat; mat %o% mat; mat %o% vec
[1] 1 2 3 10 100
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 10 100
[2,] 2 4 6 20 200
[3,] 3 6 9 30 300
[4,] 10 20 30 100 1000
[5,] 100 200 300 1000 10000
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 1, 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2, 1
[,1] [,2]
[1,] 2 8
[2,] 4 10
[3,] 6 12
, , 3, 1
[,1] [,2]
[1,] 3 12
[2,] 6 15
[3,] 9 18
, , 1, 2
[,1] [,2]
[1,] 4 16
[2,] 8 20
[3,] 12 24
, , 2, 2
[,1] [,2]
[1,] 5 20
[2,] 10 25
[3,] 15 30
, , 3, 2
[,1] [,2]
[1,] 6 24
[2,] 12 30
[3,] 18 36
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 2 8
[2,] 4 10
[3,] 6 12
, , 3
[,1] [,2]
[1,] 3 12
[2,] 6 15
[3,] 9 18
, , 4
[,1] [,2]
[1,] 10 40
[2,] 20 50
[3,] 30 60
, , 5
[,1] [,2]
[1,] 100 400
[2,] 200 500
[3,] 300 600
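The long printouts above are easier to read once you know the shape of each result: the outer product multiplies every element of the first object by every element of the second, and the dimensions of both objects are concatenated. A sketch, assuming the 3 x 2 matrix was built from the numbers 1 to 6:

```r
vec <- c(1, 2, 3, 10, 100)
mat <- matrix(1:6, ncol = 2)   # assumed construction of the 3 x 2 matrix above
dim(vec %o% vec)               # 5 5
dim(mat %o% vec)               # 3 2 5
dim(mat %o% mat)               # 3 2 3 2
(mat %o% vec)[2, 2, 4]         # mat[2, 2] * vec[4] = 5 * 10 = 50
identical(vec %o% vec, outer(vec, vec))   # TRUE: %o% is shorthand for outer()
```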
transpose
mat
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
t(mat)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
R can recognize different types of data.
We’ll look at four basic types: numeric, character, logical, and factor.
12+4
[1] 16
3000000
[1] 3e+06
class(0.000001)
[1] "numeric"
print('hello')
[1] "hello"
class("hello")
[1] "character"
"12+4"
[1] "12+4"
class("12+4")
[1] "character"
"hello" + "world"
Error in "hello" + "world" : non-numeric argument to binary operator
nchar("hello")
[1] 5
paste("hello","world",sep=",");paste("hello","world",sep=" 2342eafdsghsIJGBJmdxfghvb ")
[1] "hello,world"
[1] "hello 2342eafdsghsIJGBJmdxfghvb world"
paste(paste("hello","world",sep=","),paste("como","estas",sep="_"),2,sep="::")
paste(paste("hello","world",sep=","),paste("como","estas",sep="_"),"2",sep="::")
paste("hola","sin","espacios",sep="")
paste0("hola","sin","espacios")
[1] "hello,world::como_estas::2"
[1] "hello,world::como_estas::2"
[1] "holasinespacios"
[1] "holasinespacios"
Which are numbers?
1; "1"; "one"
[1] 1
[1] "1"
[1] "one"
c(1, "1","one")
[1] "1" "1" "one"
TRUE or FALSE: R’s form of binary data. Useful for logical tests, and very useful when we want to filter datasets…
3<4
[1] TRUE
x <- c(1, 2, 3, 4, 5)
x
[1] 1 2 3 4 5
x > 3
[1] FALSE FALSE FALSE TRUE TRUE
x >= 3
[1] FALSE FALSE TRUE TRUE TRUE
x < 3
[1] TRUE TRUE FALSE FALSE FALSE
x <= 3
[1] TRUE TRUE TRUE FALSE FALSE
x == 3
[1] FALSE FALSE TRUE FALSE FALSE
x != 3
[1] TRUE TRUE FALSE TRUE TRUE
x = 3  # careful: a single = assigns 3 to x; use == to compare
c(3,4,5,6) %in% c(2, 3, 4)
[1] TRUE TRUE FALSE FALSE
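Logical vectors like the ones above become useful when they are placed inside square brackets to select elements. A minimal sketch; the vector values is hypothetical and is used here because x was overwritten by the assignment above.

```r
values <- c(1, 2, 3, 4, 5)      # hypothetical vector for this illustration
keep <- values > 3              # FALSE FALSE FALSE TRUE TRUE
values[keep]                    # 4 5: only positions where keep is TRUE
sum(keep)                       # 2, because each TRUE counts as 1
values[values %in% c(2, 3, 9)]  # 2 3
```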
unique(titanic2$age)
[1] adult child
Levels: adult child
library(dplyr)  # provides %>% and filter()
titanic2 %>%
  filter(fate != "survived" &
         class %in% c("1st", "3rd"))
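The same kind of filter can be written in base R with a logical vector. The sketch below uses a small made-up data frame called passengers (hypothetical; it is not the titanic2 data) so that it runs on its own.

```r
# Made-up rows for illustration only; this is not the titanic2 data
passengers <- data.frame(
  class = c("1st", "2nd", "3rd", "3rd"),
  fate  = c("survived", "perished", "perished", "survived"),
  stringsAsFactors = FALSE
)
# TRUE only where both conditions hold
keep <- passengers$fate != "survived" & passengers$class %in% c("1st", "3rd")
passengers[keep, ]   # only the third row meets both conditions
```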
class(TRUE)
[1] "logical"
class(T) ; class(F)
[1] "logical"
[1] "logical"
class(3<4)
[1] "logical"
R’s form of categorical data. Saved as an integer with a set of labels (i.e. levels)
fac<-factor(c("a","b","c"))
fac
[1] a b c
Levels: a b c
class(fac)
[1] "factor"
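The statement that a factor is stored as integers with labels can be checked directly. In this sketch the factor sizes is a hypothetical example added to show that the codes follow the sorted levels, not the order of appearance.

```r
fac <- factor(c("a", "b", "c"))
as.integer(fac)    # 1 2 3: the integer codes stored internally
levels(fac)        # "a" "b" "c": the labels attached to those codes
sizes <- factor(c("low", "high", "low"))   # hypothetical example
levels(sizes)      # "high" "low": levels are sorted alphabetically by default
as.integer(sizes)  # 2 1 2
```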
One proof that factor makes sense
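The usual assignment operator is <-. Alternatively, you can use =, but <- is widely preferred in the R community.
# Assign values to the variables my_apples and my_oranges
my_apples <- 5
my_oranges <- 6
# Add these two variables together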
my_apples + my_oranges
[1] 11
Be careful with the operations between different types/classes of objects
# Assign a value to the variable my_apples
my_apples <- 5
# Assign a character string (not a number) to my_oranges
my_oranges <- "six"
# Create the variable my_fruit and print it out
my_fruit <- my_apples + my_oranges
Error in my_apples + my_oranges : non-numeric argument to binary operator
class(my_oranges)
[1] "character"
So, in general, it’s a good idea to check that the objects being combined in an operation are of the same class/type. We also have to be aware that sometimes, when the types are not equal but are “almost operable”, R will convert at least one of them to a type that makes both “totally operable”.
R sometimes warns about this… so it is a good idea to know a little more about the data types that will show up in our work.
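A few one-liners make this automatic conversion (coercion) visible. This is only a sketch of common cases; the name num is hypothetical, and suppressWarnings is used so the example runs quietly.

```r
1 + TRUE                 # 2: the logical TRUE is coerced to the number 1
class(c(1, TRUE))        # "numeric"
class(c(1, "a", TRUE))   # "character": everything becomes text
# "six" cannot be read as a number, so the result is NA (normally with a warning)
num <- suppressWarnings(as.numeric("six"))
as.numeric("6") + 5      # 11: an explicit conversion makes the operation valid
```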
On a Vector…
vec<-c(1,"R","TRUE")
class(vec)
[1] "character"
vec
[1] "1" "R" "TRUE"
Sure a Matrix will do it…
matriz_de_Camilo<-matrix(cbind(c(1,2,3),
c("R","S","T"),
c(TRUE,FALSE,TRUE)),ncol=3)
class(matriz_de_Camilo)
[1] "matrix"
matriz_de_Camilo
[,1] [,2] [,3]
[1,] "1" "R" "TRUE"
[2,] "2" "S" "FALSE"
[3,] "3" "T" "TRUE"
for(row_tmp in 1:nrow(matriz_de_Camilo)){
print(class(matriz_de_Camilo[row_tmp,]))
}
[1] "character"
[1] "character"
[1] "character"
for(col_tmp in 1:ncol(matriz_de_Camilo)){
print(class(matriz_de_Camilo[,col_tmp]))
}
[1] "character"
[1] "character"
[1] "character"
What the … is R doing?!
Always remember Coercion
So, isn’t there any way we can do it?
Really? ;(
There is a way… Thank God for the data frames…
And for the lists…
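Unlike a matrix, a data frame keeps a separate type for each column. The sketch below stores the same three columns used in the matrix above; the name df_example and its column names are hypothetical.

```r
# Hypothetical data frame; the names are illustrative only
df_example <- data.frame(
  number = c(1, 2, 3),
  letter = c("R", "S", "T"),
  flag   = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep the text column as character
)
sapply(df_example, class)    # number: "numeric", letter: "character", flag: "logical"
```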
When we read a .csv file (for example with read.csv) and store it in an object, that object will be of class data.frame
class(titanic2)
[1] "data.frame"
And now, just because it’s worth it…
There are some types of objects that are very similar to data frames but are not exactly the same thing
They come from the tibble package, which dplyr (one of my favorites) builds on; such an object is called a tibble (class tbl_df; older constructor name: data_frame) instead of a plain data.frame
Example:
class(flights)
[1] "tbl_df" "tbl" "data.frame"
Print on console:
titanic2 and, afterwards, flights
Do you see any difference?
nlst
$one
[1] 1
$two
[1] 2
$many
[1] 1 2 3
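The call that created this first version of nlst is not shown. A construction that would print the same structure, assuming plain numeric values, is:

```r
# Assumed construction that matches the printed structure
nlst <- list(one = 1, two = 2, many = c(1, 2, 3))
names(nlst)          # "one" "two" "many"
length(nlst$many)    # 3: each element keeps its own length
```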
nlst<-list("Eduardo"=df_de_Camilo,"Camilo"=matriz_de_Camilo,"Carlos"=c(T,FALSE,TRUE,F))
#Print directly on console
nlst
$Eduardo
$Camilo
[,1] [,2] [,3]
[1,] "1" "R" "TRUE"
[2,] "2" "S" "FALSE"
[3,] "3" "T" "TRUE"
$Carlos
[1] TRUE FALSE TRUE FALSE
nlst$Eduardo
#Print directly on console
nlst[1]
$Eduardo
NA
#Print directly on console
nlst[[1]]
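The three access forms above differ in what they return. A small sketch on a hypothetical list called lst:

```r
lst <- list(one = 1, many = c(1, 2, 3))   # hypothetical list
class(lst[2])      # "list": single brackets return a shorter list
class(lst[[2]])    # "numeric": double brackets return the element itself
identical(lst[["many"]], lst$many)   # TRUE: $ is shorthand for [[ with a name
```

Single brackets are handy for keeping several elements at once; double brackets or $ are what you need before doing arithmetic on an element.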
nlst02<-list("Eduardo"=df_de_Camilo,"Camilo"=matriz_de_Camilo,"Carlos"=c(T,FALSE,TRUE,F),"unalistadentrodeunalista"=nlst)
#Print directly on console
nlst02
$Eduardo
$Camilo
[,1] [,2] [,3]
[1,] "1" "R" "TRUE"
[2,] "2" "S" "FALSE"
[3,] "3" "T" "TRUE"
$Carlos
[1] TRUE FALSE TRUE FALSE
$unalistadentrodeunalista
$unalistadentrodeunalista$Eduardo
$unalistadentrodeunalista$Camilo
[,1] [,2] [,3]
[1,] "1" "R" "TRUE"
[2,] "2" "S" "FALSE"
[3,] "3" "T" "TRUE"
$unalistadentrodeunalista$Carlos
[1] TRUE FALSE TRUE FALSE
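Elements of a list stored inside another list are reached by chaining the access operators. A minimal sketch with hypothetical names inner_lst and outer_lst:

```r
inner_lst <- list(flags = c(TRUE, FALSE))        # hypothetical names
outer_lst <- list(id = 1, nested = inner_lst)    # a list inside a list
outer_lst$nested$flags[2]          # FALSE
outer_lst[["nested"]][["flags"]]   # TRUE FALSE
str(outer_lst)                     # compact overview of the nested structure
```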